RISC World

HTML to DTP

Dave Holden

Note. There have been minor improvements and bug fixes to the !Imp-HTML and !Ovn-HTML programs published in Issues 4 and 5. The latest versions of these programs can be found in the directory SOFTWARE.ISSUE5 on this CD.

HTML to DTP

This is a program to convert a straightforward HTML file either to plain text or into a form suitable for loading into Impression or Ovation Pro. It is not intended to work with complex files with intricate layouts and tables, but rather with HTML 'documents' such as on-line manuals and texts.

One reason for doing this is that although it is possible to print the original HTML file from your browser you will then be obliged to accept it in the form in which it appears in the browser window. This may not be ideal. By loading it into your DTP program instead you can alter the fonts, point size, page layout, and other factors to make it more convenient to view and print.

Using HTML-DTP

Run the program by double-clicking on it in the usual way. To open the main window either click SELECT on the iconbar icon or select 'Process' from the iconbar menu.

On the right-hand side are three buttons which enable you to choose between converting to Impression RTF (Rich Text Format), Ovation Pro DDL (Document Description Language) or plain text. As you click on these buttons the file icon at the bottom will change between a text icon (for Impression and plain text) and the file icon for Ovation Pro DDL. Similarly, the filename in the Save icon will change to the default for the selected filetype, namely TextStory, DDLfile or TextFile respectively.

To convert a file, drag the HTML file you want to convert to the icon at the top labelled 'Source file'. It must have the correct RISC OS filetype for HTML (&FAF) or it won't be accepted. Its name will appear in the icon and the file icon at the bottom of the window, which will initially be 'greyed out', will change to a normal appearance. All you have to do now is select the format you want to convert to and drag this file icon to a suitable directory on your hard drive, altering the name first if you wish.

Loading a file into a DTP program

A plain text file can be double-clicked on to load it into your text editor. Similarly, a DDL file will normally load directly into Ovation Pro provided the program has been Run or 'seen' by the filer. However, to load an RTF file into Impression you will have to drag it either to the Impression icon on the iconbar or into a blank document window.

With an Ovation Pro or Impression document any headings, bold and italic text should be properly converted. Ordered and unordered lists will be correctly indented. An unordered list should have each new item marked with a bullet character and an ordered list should have each item preceded by a sequential number.

Obviously plain text can't have the typeface changes, but lists will be shown, although they won't be indented as this would make it more difficult for you to format the text subsequently if you wish to do so.
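The plain text treatment of lists described above can be sketched in a few lines of Python. This is only an illustration, not the program's own code: the function name render_list and the use of '*' as the bullet character are assumptions made for the example.

```python
def render_list(items, ordered=False, bullet="*"):
    """Render list items as unindented plain-text lines.

    Ordered lists get sequential numbers, unordered lists get a bullet.
    The bullet character is an assumption made for this sketch.
    """
    lines = []
    for number, item in enumerate(items, start=1):
        marker = f"{number}." if ordered else bullet
        # No leading indent, so the text stays easy to reformat later.
        lines.append(f"{marker} {item}")
    return lines


print("\n".join(render_list(["Load file", "Convert", "Save"], ordered=True)))
print("\n".join(render_list(["Impression", "Ovation Pro"])))
```

Leaving out the indent mirrors the reasoning given above: the markers show where each item starts, but nothing has to be stripped out before the text is formatted elsewhere.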

Pictures and Links

If there are any links to pictures or other files in the HTML these links can be included. The pictures themselves are not inserted into the file, but by including the links you can, if you wish, convert any pictures into a suitable format and then place them in the DTP document in the usual way at the position indicated.

To have these links included you should select the 'Show links' or 'Show image links' icons at the left of the window. With a plain text file a file link is shown in the form:

     --<< Link to "name" >>--

and an image link in the form:

     --[[ Image "name" ]]--

where in both cases "name" is the name of the file or image linked to.
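As a rough illustration of these two marker formats, the Python sketch below builds them and then finds them again in a piece of converted text. The function names and the sample file names (logo.gif, page2.html) are hypothetical and chosen only for the example; this is not how HTML-DTP itself is written.

```python
import re


def file_link_marker(name):
    """Return the plain-text marker for a link to another file."""
    return f'--<< Link to "{name}" >>--'


def image_link_marker(name):
    """Return the plain-text marker for a link to an image."""
    return f'--[[ Image "{name}" ]]--'


# Matches either kind of marker so they can be located in converted text.
MARKER = re.compile(r'--<< Link to "[^"]*" >>--|--\[\[ Image "[^"]*" \]\]--')


def find_markers(text):
    """Return every link or image marker found in the text, in order."""
    return MARKER.findall(text)


sample = "Intro " + image_link_marker("logo.gif") + " see " + file_link_marker("page2.html")
print(find_markers(sample))
```

Because the markers are so distinctive, a simple search like this (or the Find facility in a text editor) is enough to locate them.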

With Impression or Ovation Pro, instead of being enclosed in '--<< >>--' or '--[[ ]]--' markers, a file link is shown in green text and an image link in red.

Marking links in this way enables you to quickly identify them and delete any that are unwanted.

Title

Most HTML documents are given a title which is shown by the browser. If you select the 'Include Title' icon then this will be shown at the start of the document. In Impression and Ovation Pro this will be in blue text.

Paragraph spacing

Text in HTML documents is normally unformatted, just as in wordprocessors, with markers at the ends of paragraphs. This is usually done by placing a <P> tag at the start of each paragraph and a </P> tag at the end of each paragraph. In a browser paragraphs marked in this way will have a definite gap between them.

In addition the <BR> tag is used to indicate a line 'break'. Unlike the <P> tags there would not be a gap between lines terminated in this way.

Sometimes only the <BR> tag is used, and sometimes the <P> tag is used without the corresponding </P> tag. To accommodate these variations the 'Double <BR>' and 'Double <P>' buttons, when selected, will insert extra line breaks when <BR> and <P> tags respectively are found, effectively inserting extra blank lines between each paragraph.

This may result in too many blank lines, but these can easily be seen and deleted. This is preferable to having no obvious breaks between paragraphs as this could make the text difficult to read or edit.

How you set these options will depend upon how the codes are used in any particular file and your own preferences. Some experimentation may be required to find the best combination for any particular circumstances.
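The effect of the two 'Double' options can be pictured with the following Python sketch. It is a simplified model resting on assumptions, not the program's actual logic: it treats every <BR> and <P> as one line break, or two when the matching option is on, and simply discards </P> tags since they are often missing anyway. The function name apply_breaks is made up for the example.

```python
import re


def apply_breaks(html, double_br=False, double_p=False):
    """Replace <BR> and <P> tags with one or two newlines each."""
    br = "\n\n" if double_br else "\n"
    p = "\n\n" if double_p else "\n"
    text = re.sub(r"<BR\s*/?>", br, html, flags=re.IGNORECASE)
    # Closing </P> tags are optional in practice, so they are just dropped.
    text = re.sub(r"</P\s*>", "", text, flags=re.IGNORECASE)
    # Only a bare <P> or <P attributes> matches; <PRE> is left alone.
    text = re.sub(r"<P(\s[^>]*)?>", p, text, flags=re.IGNORECASE)
    return text.strip("\n")


print(repr(apply_breaks("<P>first<P>second")))
print(repr(apply_breaks("<P>first<P>second", double_p=True)))
```

Trying the same input with each combination of options shows quickly which one gives clear paragraph gaps without too many blank lines.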

Saving Choices

To avoid the need to set all these buttons each time you start !HTML-DTP you can select 'Save Choices' from the iconbar menu. The settings will be saved in a choices file inside the application directory and restored the next time you start the program.

Fonts and Styles

When you look at a converted file in Ovation Pro or Impression the various headings and indented lists are converted to appropriate styles. These can be seen on the Style menu. For example, after loading a converted DDL file the Style menu in Ovation Pro would include 'Body Text', 'Heading 1' to 'Heading 6', 'List 1' to 'List 3' and 'Code'.

The Style menu in Impression would be almost identical except that the top item (the 'base' style) would be called 'Normal' instead of 'Body Text'. Also some additional styles might appear as the styles in the converted document are added to any already present in the default setup.

As you might imagine the styles 'Heading 1' to 'Heading 6' are applied to the HTML heading styles <H1> to <H6>. 'List 1', 'List 2' and 'List 3' are used for <UL> and <OL> structures. 'List 1' is the normal style, 'List 2' is used where one list is embedded in another, and 'List 3' is used for a further embedding. The only difference between these styles is that each is indented further from the left side of the page to produce an effect similar to that seen in HTML.
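The mapping from HTML structure to style names can be summarised in a short Python sketch. The function names are hypothetical, and the handling of lists nested more than three deep (reusing 'List 3') is an assumption made here, since only three list styles are described.

```python
def heading_style(tag):
    """Map an HTML heading tag name such as 'H2' to its style name."""
    level = int(tag[1:])
    if not 1 <= level <= 6:
        raise ValueError(f"not a heading tag: {tag}")
    return f"Heading {level}"


def list_style(depth):
    """Map list nesting depth (1 = outermost) to a style name.

    Depths beyond 3 reuse 'List 3'; this is an assumption of the sketch.
    """
    return f"List {min(max(depth, 1), 3)}"


print(heading_style("H1"), "|", list_style(1), "|", list_style(2))
```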

The six Heading styles all use the font Homerton Bold in descending point sizes. All the others use Trinity Medium in 12 point except for 'Code' which uses the font Corpus. This is employed to mimic text enclosed in <PRE> tags in HTML.

Changing the styles

You may find that the default styles used by HTML-DTP aren't suitable, or that you simply don't like them. These are all set by template header files, so they are very easy to change.

Inside the !HTML-DTP application directory there is a subdirectory called Resources. In this you will find two files, a text file named Impression and a DDL file named OPro. If you drag the appropriate file to your DTP program it will be loaded and you will be able to edit the various styles in the usual way.

Because the various 'List' styles are based on the 'Normal' or 'Body Text' style, any changes made to the base style will be reflected in these. If you do change the base style, especially if you increase the point size, you may need to alter the ruler for the 'List' styles. This is because the first Tab position and the left-hand wrap point for the text are set to just accommodate a two-digit number and a full stop, as used when numbering items in the list. If you increase the point size without moving these points on the ruler, the item numbers may no longer fit before the first Tab position and the list text will not line up properly.

Once you have done this the file can be re-saved. Before saving, don't forget to delete any text you may have entered when trying out your altered styles, otherwise this text will appear in all converted documents.

When editing the styles it is best if you delete any that aren't actually used by HTML-DTP. For example, the default Impression document has the styles Main Heading, Sub Heading, 1in indent, Hanging indent and Table. There won't be any problem if you don't remove them, but they will clutter up the Style menu on converted documents and they won't be used.

To save the file with Impression open the 'Save text story' window from the File menu.

Make sure that you save it with the filename 'Impression' or it won't be recognised by the program, and check that 'With styles' is ticked or the important style information won't be saved. The process with Ovation Pro is very similar. Open the 'Save as' window as usual from the File menu. Before saving the file check that the name is 'OPro' and that 'DDL' is selected.

Before editing either of these template files it would probably be wisest if you made a backup copy of the original first in case something goes wrong.

Dave Holden
